Error Rate Analysis of Labeling by Crowdsourcing

نویسندگان

  • Hongwei Li
  • Bin Yu
چکیده

Crowdsourcing label generation has been a crucial component for many real-world machine learning applications. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules for the Dawid-Skene (and Symmetric DawidSkene ) crowdsourcing model. The bounds can be applied to analyze many commonly used prediction methods, including the majority voting, weighted majority voting and maximum a posteriori (MAP) rules. These bound results can be used to control the error rate and design better algorithms. In particular, under the Symmetric Dawid-Skene model we use simulation to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule which approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule. Meanwhile, the average error rate of the EM-MAP rule is bounded well by the upper bound on the mean error rate of the oracle MAP rule in the simulation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Theoretical Analysis and Efficient Algorithms for Crowdsourcing

Theoretical Analysis and Efficient Algorithms for Crowdsourcing by Hongwei Li Doctor of Philosophy in Statistics University of California, Berkeley Professor Bin Yu, Chair Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to a...

متن کامل

Error Rate Bounds in Crowdsourcing Models

Crowdsourcing is an effective tool for human-powered computation on many tasks challenging for computers. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules under the Dawid-Skene crowdsourcing model. The bounds can be applied to analyze many common prediction methods, including the majority voting ...

متن کامل

Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing

Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectat...

متن کامل

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model

While crowdsourcing has become an important means to label data, crowdworkers are not always experts— sometimes they can even be adversarial. Therefore, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoret...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013